Variable selection in large environmental data sets using principal components analysis

1999 ◽  
Vol 10 (1) ◽  
pp. 67-77 ◽  
Author(s):  
Jacquelynne R. King ◽  
Donald A. Jackson
2019 ◽  
Vol 20 (1) ◽  
pp. 141
Author(s):  
Ildefonso Baldiris-Navarro ◽  
Juan Carlos Acosta-Jimenez ◽  
Angel Dario Gonzalez-Delgado ◽  
Alvaro Realpe-Jimenez ◽  
Juan Gabriel Fajardo-Cuadro

Coastal lagoons are among the most threatened ecosystems in the world; population growth, habitat destruction, pollution, wastewater, overexploitation, and invasive species are the main causes of their degradation. The objective of this paper was to evaluate water quality behavior in a stressed coastal lagoon in Cartagena, in the Colombian Caribbean. Environmental data were analyzed using hypothesis testing, confidence intervals, and principal components analysis (PCA). The study focused on water parameters such as dissolved oxygen (DO), biochemical oxygen demand (BOD5), chemical oxygen demand (COD), salinity, pH, total dissolved solids, total coliforms (TC), fecal coliforms (FC), ammonium (NH4+), and total phosphorus (TP). The analysis was conducted in line with the Colombian national water standard. Results showed that BOD5, COD, phosphorus, and coliforms exceed the Colombian limits for these variables and are reaching levels that may threaten human health. Principal components analysis detected five components that explained 79.4% of the variance in the data and showed that anthropogenic and temporal factors might be affecting the variation of the parameters.
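
A figure such as "five components explained 79.4% of the variance" is a cumulative explained-variance share. The sketch below shows one common way to compute such a share. It is not the authors' procedure: the data are synthetic random numbers standing in for ten water-quality parameters, the function name `cumulative_explained_variance` is hypothetical, and standardizing each variable first (PCA on the correlation matrix) is an assumption made here because the parameters carry different units.

```python
import numpy as np

def cumulative_explained_variance(data):
    """Cumulative share of total variance captured by the first k components.

    Assumption: each column is standardized first, so the PCA is effectively
    run on the correlation matrix (hypothetical choice, not from the paper).
    """
    standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.cov(standardized, rowvar=False)
    # eigvalsh returns eigenvalues in ascending order; reverse to rank them
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    return np.cumsum(eigenvalues / eigenvalues.sum())

# Synthetic stand-in data: 60 samples of 10 correlated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
loadings = rng.normal(size=(3, 10))
data = latent @ loadings + 0.5 * rng.normal(size=(60, 10))

cumulative = cumulative_explained_variance(data)
for k, share in enumerate(cumulative, start=1):
    print(f"first {k} component(s): {share:.1%} of variance")
```

Reading the share at k = 5 from such a table gives the kind of number quoted in the abstract; the values printed here describe only the synthetic data.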


2004 ◽  
Vol 12 (5) ◽  
pp. 36-39 ◽  
Author(s):  
Brent Neal ◽  
John C. Russ

Principal components analysis of multivariate data sets is a standard statistical method that was developed in the early half of the 20th century. It provides researchers with a method for transforming their source data axes into a set of orthogonal principal axes, each with a rank. The rank of each axis in the principal set represents the significance of that axis, as defined by the variance in the data along that axis. Thus, the first principal axis is the one with the greatest amount of scatter in the data and consequently the greatest amount of contrast and information, while the last principal axis represents the least amount of information.
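
The transformation described above can be sketched with an eigen-decomposition of the covariance matrix. This is a minimal illustration under stated assumptions, not the article's own code: the function name `principal_axes` is hypothetical and the input is synthetic random data with deliberately unequal spread per column.

```python
import numpy as np

def principal_axes(data):
    """Return orthogonal principal axes (as columns) and the variance along
    each one, ranked from largest to smallest variance."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues
    variances, axes = np.linalg.eigh(cov)
    order = np.argsort(variances)[::-1]  # rank axes by variance, descending
    return axes[:, order], variances[order]

# Synthetic data: 500 samples, three columns with very different scatter.
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])

axes, variances = principal_axes(samples)
# Coordinates of every sample expressed along the principal axes.
scores = (samples - samples.mean(axis=0)) @ axes
print("variance along each ranked axis:", variances)
```

The first column of `scores` holds the coordinates along the axis with the most scatter, and the last column those along the axis with the least, which matches the ranking described in the abstract.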


2013 ◽  
Vol 7 (1) ◽  
pp. 19-24
Author(s):  
Kevin Blighe

Elaborate downstream methods are required to analyze large microarray data-sets. When the end goal is to look for relationships between (or patterns within) different subgroups or even just individual samples, large data-sets must first be filtered using statistical thresholds in order to reduce their overall volume. As an example, in anthropological microarray studies, such ‘dimension reduction’ techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large data-sets, a subset can first be taken to represent the larger data-set. For example, polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a sub-set of variation in a data-set that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and the ways in which it can be applied to modern large-scale biological data-sets. New methods of analysis using PCA are also suggested, with tentative results outlined.


2017 ◽  
Author(s):  
Dean Hendrix

This study analyzed 2005–2006 Web of Science bibliometric data from institutions belonging to the Association of Research Libraries (ARL) and corresponding ARL statistics to find any associations between indicators from the two data sets. Principal components analysis on 36 variables from 103 universities revealed obvious associations between size-dependent variables, such as institution size, gross totals of library measures, and gross totals of articles and citations. However, size-independent library measures did not associate positively or negatively with any bibliometric indicator. More quantitative research must be done to authentically assess academic libraries’ influence on research outcomes.

